Skip to content

HTTP request callback support#1689

Merged
SteveSandersonMS merged 56 commits into
mainfrom
stevesandersonms/llm-inference-callbacks
Jun 23, 2026
Merged

HTTP request callback support#1689
SteveSandersonMS merged 56 commits into
mainfrom
stevesandersonms/llm-inference-callbacks

Conversation

@SteveSandersonMS

@SteveSandersonMS SteveSandersonMS commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds SDK support for intercepting HTTP requests (for LLM inference or anything else) and handling them in user code across all six SDK languages: Node.js, .NET, Python, Go, Rust, and Java.

Consumers register one client-global CopilotRequestHandler (constructed once, no args). The runtime invokes it over JSON-RPC (llmInference.*) whenever it would otherwise issue a model-layer HTTP or WebSocket request — for both BYOK and CAPI — fully replacing the outbound call. A handler that overrides nothing is a transparent pass-through.

It includes the full feature work on this branch:

  • wire up LLM inference provider registration and generated RPC types
  • add the raw chunked callback protocol for outbound inference requests and responses
  • cover plain HTTP, streaming responses, cancellation (runtime- and consumer-initiated), error mapping, session-id threading, and WebSocket transport
  • port the feature to every SDK language with idiomatic HTTP types
  • keep the public callback surface to a single CopilotRequestHandler model with forwarding helpers

What changed

Shared protocol and plumbing

  • add generated RPC / session event types needed for LLM inference callbacks across all SDK surfaces
  • add SDK-side registration for a process-global LLM inference provider (llmInference.setProvider)
  • route outbound inference requests through the callback bridge instead of requiring provider-specific hooks

Per-language ports

Each language exposes the same CopilotRequestHandler model, mapped onto the most canonical HTTP representation available in that ecosystem:

Language HTTP request/response type WebSocket type
Node.js Request / Response (Fetch) per-connection handler
.NET HttpRequestMessage / HttpResponseMessage per-connection handler
Python httpx request/response per-connection handler
Go *http.Request / *http.Response per-connection handler
Rust http::Request / http::Response per-connection handler
Java java.net.http HttpRequest / HttpResponse per-connection handler

All ports thread cancellation and session id through the request context, and provide a CopilotWebSocketForwarder (or language equivalent) for the common mutate-and-forward case.

API shape

  • collapse HTTP interception to a single send hook (SendRequestAsync / sendRequest / send_request / language equivalents)
  • expose WebSocket interception through OpenWebSocketAsync / openWebSocket / open_websocket, returning a per-connection handler object
  • allow consumers to mutate, drop, duplicate, or fully replace request/response messages while keeping the common forwarding case straightforward

Usage examples

C#

using GitHub.Copilot;
using System.Net.Http;

sealed class MyHandler : CopilotRequestHandler
{
    protected override async Task<HttpResponseMessage> SendRequestAsync(
        HttpRequestMessage request,
        CopilotRequestContext ctx)
    {
        request.Headers.Add("X-Debug-Session", ctx.SessionId ?? "none");
        return await base.SendRequestAsync(request, ctx);
    }

    protected override Task<CopilotWebSocketHandler> OpenWebSocketAsync(CopilotRequestContext ctx)
        => Task.FromResult<CopilotWebSocketHandler>(new MyForwardingSocket(ctx));
}

sealed class MyForwardingSocket : CopilotWebSocketForwarder
{
    public MyForwardingSocket(CopilotRequestContext ctx)
        : base(ctx)
    {
    }

    public override Task SendRequestMessageAsync(CopilotWebSocketMessage message)
    {
        var text = message.GetText().Replace("model-A", "model-B");
        return base.SendRequestMessageAsync(CopilotWebSocketMessage.Text(text));
    }
}

Register the handler when constructing the client:

var client = new CopilotClient(new CopilotClientOptions { RequestHandler = new MyHandler() });

Node.js

import {
    CopilotRequestContext,
    CopilotRequestHandler,
    CopilotWebSocketForwarder,
} from "@github/copilot";

class MyHandler extends CopilotRequestHandler {
    protected override async sendRequest(request: Request, ctx: CopilotRequestContext): Promise<Response> {
        const headers = new Headers(request.headers);
        headers.set("x-debug-session", ctx.sessionId ?? "none");

        return super.sendRequest(new Request(request, { headers }), ctx);
    }

    protected override async openWebSocket(ctx: CopilotRequestContext) {
        return new MyForwardingSocket(ctx);
    }
}

class MyForwardingSocket extends CopilotWebSocketForwarder {
    override sendRequestMessage(data: string | Uint8Array) {
        if (typeof data === "string") {
            return super.sendRequestMessage(data.replace("model-A", "model-B"));
        }
        return super.sendRequestMessage(data);
    }
}

Java

import com.github.copilot.CopilotRequestContext;
import com.github.copilot.CopilotRequestHandler;
import com.github.copilot.CopilotWebSocketHandler;
import com.github.copilot.CopilotWebSocketMessage;
import com.github.copilot.CopilotWebSocketForwarder;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MyHandler extends CopilotRequestHandler {
    @Override
    protected HttpResponse<InputStream> sendRequest(HttpRequest request, CopilotRequestContext ctx) throws Exception {
        HttpRequest mutated = HttpRequest.newBuilder(request, (n, v) -> true)
                .header("X-Debug-Session", ctx.sessionId() == null ? "none" : ctx.sessionId())
                .build();
        return super.sendRequest(mutated, ctx);
    }

    @Override
    protected CopilotWebSocketHandler openWebSocket(CopilotRequestContext ctx) {
        return new CopilotWebSocketForwarder(ctx) {
            @Override
            public void sendRequestMessage(CopilotWebSocketMessage message) throws Exception {
                String text = message.text().replace("model-A", "model-B");
                super.sendRequestMessage(CopilotWebSocketMessage.text(text));
            }
        };
    }
}

Register the handler when constructing the client:

CopilotClient client = new CopilotClient(
    new CopilotClientOptions().setRequestHandler(new MyHandler()));

Tests

Each language adds e2e coverage (mirroring a shared reference suite) for:

  • callback provider registration
  • HTTP inference interception
  • streaming inference interception
  • error mapping
  • runtime-initiated and consumer-initiated cancellation
  • session-id threading
  • WebSocket callback handling
  • an idiomatic handler test exercising mutate-and-forward over both HTTP and WebSocket

Resolves github/copilot-sdk-internal#88

@SteveSandersonMS SteveSandersonMS changed the title Simplify LLM inference callback handlers LLM inference callback support Jun 16, 2026
@SteveSandersonMS SteveSandersonMS force-pushed the stevesandersonms/llm-inference-callbacks branch from 815bbd0 to 7bc95c0 Compare June 19, 2026 15:00
@github-actions

This comment has been minimized.

Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/LlmRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Comment thread dotnet/src/Types.cs Outdated
Comment thread dotnet/src/GitHub.Copilot.SDK.csproj Outdated
Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
stevesa and others added 18 commits June 22, 2026 14:38
Adds an opt-in llmInference config to CopilotClientOptions that lets SDK consumers register a callback the runtime invokes whenever it would otherwise issue an outbound non-streaming LLM HTTP request itself.

v1 scope is TS-only/non-streaming, mirroring the runtime support added in github/copilot-agent-runtime. Streaming SSE and WebSocket transports are out of scope for v1 and continue to bypass the callback.

- New `LlmInferenceProvider` interface with a single `onLlmRequest` method.

- `createLlmInferenceAdapter` converts the provider into the wire-shape `LlmInferenceHandler` consumed by the RPC dispatcher.

- Client wiring: `llmInference.setProvider` is sent on connect; per-session adapter is attached alongside the existing sessionFs hook.

- New `llm_inference.e2e.test.ts` exercises the full RPC round-trip against the runtime.

Resolves github/copilot-sdk-internal#88

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matches the runtime move of `llmInference.httpRequest` out of the
session-scoped client API and onto a new `clientGlobal` schema root.

- Codegen emits a new `registerClientGlobalApiHandlers` alongside the
  existing `registerClientSessionApiHandlers`. Handlers passed to it
  are dispatched directly (no per-session `getHandlers` callback) and
  carry no implicit sessionId — sessionId, when present, is just a
  payload field on the call.
- `CopilotClient` now constructs the LLM inference adapter once and
  registers it process-wide via `registerClientGlobalApiHandlers`
  during connection setup. The per-session `setupLlmInference` path
  and the `SessionConfigBase.createLlmInferenceProvider` override are
  removed — there is no longer any per-session notion of which provider
  to use.
- `LlmInferenceConfig.createLlmInferenceProvider` is now `() =>
  LlmInferenceProvider` (was `(session) => ...`).
- `LlmInferenceRequest` exposes the new optional `sessionId` field
  so consumers can correlate requests with a runtime session when one
  is in scope.

E2E test updated to verify the global registration works and that
sessionId is populated on in-session traffic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With the Rust runtime intercept chokepoint in place, every model-layer
HTTP request - including /models and /models/session - is now dispatched
through the SDK callback. Update the e2e test to:

- Stub realistic responses for non-streaming model catalog and session
  endpoints (so the runtime can proceed past model resolution).
- Hard-assert the catalog request is intercepted (no more 'either-or'
  fallback for the pre-rust-intercept state).

Streaming inference requests still pass through to the recorded CAPI
proxy; a fully-mocked end-to-end inference test will land alongside
the streaming-intercept commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extends LlmInferenceProvider with an optional onLlmStreamRequest method
that returns a response head synchronously and pushes body chunks via the
provided sink. The adapter implements the generated httpStreamStart RPC
method and forwards chunks back to the runtime via the typed server-RPC
client (llmInference.streamChunk / streamEnd).

Adds a fully-mocked e2e test (test/e2e/llm_inference_stream.e2e.test.ts)
that drives a complete user->assistant turn through the callback alone:
the runtime hits the callback for /models, /models/session, and the chat
completion itself, the assistant text returned to the SDK consumer is the
synthetic text supplied by the stub.

- nodejs/src/llmInferenceProvider.ts: LlmInferenceStreamSink, onLlmStreamRequest, httpStreamStart adapter
- nodejs/src/client.ts: pass a lazy server-RPC accessor into the adapter
- nodejs/src/index.ts: re-export new types
- nodejs/test/e2e/llm_inference_stream.e2e.test.ts: full-mock e2e
- nodejs/src/generated/*, python/*, go/*, rust/*: codegen for new RPC methods
- dotnet/src/Generated/*: codegen for new RPC methods

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds test/e2e/llm_inference_errors.e2e.test.ts that wires a callback whose
inference handler throws a synthetic transport error and verifies the
failure surfaces to the SDK consumer (the call does not hang and any error
caught is non-empty). Confirms the runtime's existing retry / error
reporting path handles callback-side failures the same way it handles real
transport failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirrors the runtime-side cleanup: the callback wire no longer carries providerType / endpointKind / wireApi / transport / modelId. Adapter stops forwarding the field, e2e tests filter by URL instead of metadata, and the missing LlmInferenceStreamSink / LlmInferenceStreamStartResponse re-exports in types.ts are added so index.ts type-checks cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
[Phase 3] Realign the Node SDK with the runtime's new four-method chunk
protocol. One unified provider callback:

  interface LlmInferenceProvider {
    onLlmRequest(req: LlmInferenceRequest): Promise<void>;
  }

LlmInferenceRequest exposes:
  * url / method / headers / sessionId
  * requestBody: AsyncIterable<Uint8Array>     // body delivered as chunks
  * responseBody: LlmInferenceResponseSink     // start/write/end/error

The sink enforces start -> 0..N writes -> exactly one of end/error and
maps each call to the corresponding httpResponseStart / httpResponseChunk
RPC. createLlmInferenceAdapter maintains a per-requestId state map; the
generated httpRequestStart handler registers state synchronously and
fires onLlmRequest in the background, so the runtime's RPC reply isn't
gated on consumer I/O.

The body queue iterator now latches a 'done' flag so a consumer that
calls .next() again after end:true gets done back instead of blocking
forever waiting for chunks the runtime will never send.

Removes the previous onLlmRequest + onLlmStreamRequest split and the
LlmInferenceResponse / LlmInferenceStreamSink /
LlmInferenceStreamStartResponse public types. All three e2e tests
rewritten against the unified callback (one of them URL-dispatches
/responses -> SSE and /chat/completions -> buffered JSON; the consumer
can also branch on whether the request body has stream:true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 4.1: expose an AbortSignal on the request envelope, abort it on a
cancel chunk from the runtime, and map consumer-side aborts to a 499 +
error{code:cancelled} response. Adds the cancellation e2e test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an e2e test asserting that when the SDK consumer signals a terminal
error via responseBody.error({ code: 'cancelled' }) the runtime surfaces
it faithfully as a request failure rather than hanging. Completes the
consumer->runtime direction of Phase 4.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Surface the new `transport` discriminator on `LlmInferenceRequest` so
consumers can tell an `"http"` request (plain HTTP / SSE) from a
`"websocket"` one (full-duplex: each request-body chunk is one inbound
WS message, each response-body write one outbound message). The adapter
threads `params.transport` through, defaulting to `"http"`.

Regenerate rpc.ts against the runtime schema for the new field and add
an e2e test exercising the full-duplex path: the fake model advertises
`ws:/responses`, the runtime's WebSocket flag is enabled via env var,
and the consumer pumps `/responses` events back per inbound message.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Friendly product-code starting point for SDK consumers who want to
observe or mutate LLM inference requests/responses by overriding virtual
methods on a base class. Implements LlmInferenceProvider, so an instance
can be returned directly from createLlmInferenceProvider.

Default behaviour is a transparent pass-through: each request is
forwarded to its original URL via the WHATWG fetch global (HTTP) or
WebSocket global (WebSocket), and the upstream response is streamed back
unchanged. The same subclass handles both transports - onLlmRequest
dispatches on req.transport.

Virtual hooks:
- HTTP: transformRequest, forward, transformResponse
- WebSocket: forwardWebSocket, transformRequestMessage,
  transformResponseMessage

E2e test (llm_inference_handler.e2e.test.ts) demonstrates a single
TestHandler subclass servicing both an HTTP turn (single-shot title
generation) and a WebSocket turn (main agent turn) against a per-test
in-process http+ws upstream that speaks the real CAPI shapes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review fixes for github/copilot-sdk-internal#88 (Node SDK side).

- Honor the runtime's accepted=false ack: the response sink now aborts the
  provider's signal and stops emitting once the runtime drops the request (I1).
- Add a staging backstop in the adapter so a body chunk that arrives before
  its start frame is buffered and replayed rather than silently dropped (B1).
- Run the WebSocket request/response pumps concurrently and race their
  terminal states, so an upstream-closes-first (or runtime-cancels-first)
  case tears the other side down instead of hanging on a parked iterator (B2).
- Buffer inbound WS frames in wrapGlobalWebSocket until onMessage is
  registered so the first frames of a fast upstream aren't dropped.
- Collapse the dead send branch, hoist TextEncoder/TextDecoder singletons,
  and correct the LlmWebSocketUpstream.onClose contract doc.
- Update CopilotClientOptions.llmInference docs: streaming SSE and WebSocket
  are intercepted, not bypassed (I6).
- Add unit tests: chunk-before-start staging, accepted=false abort, WS
  upstream-close-first finalisation, and WS upstream-error propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drives a CAPI session and a BYOK (openai/responses) session entirely
through the LLM inference callback — the consumer fabricates every
model-layer response, so the CAPI record/replay proxy is never the
inference endpoint. Asserts each in-session inference request carries
req.sessionId === session.sessionId and that the two session ids differ.

The mock branches /responses on the request stream flag: BYOK turns whose
config-derived model does not advertise streaming issue a buffered
(non-streaming) /responses request expecting a single JSON response
object, whereas the CAPI turn streams via SSE. This mirrors real upstream
behaviour and confirms the callback transport faithfully delivers both
shapes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirrors the TypeScript LLM inference callback feature in the .NET SDK so
consumers can observe/mutate the model-layer HTTP/WebSocket requests the
runtime issues (CAPI and BYOK), with the runtime session id threaded into
each callback.

- scripts/codegen/csharp.ts: emit the clientGlobal handler interface +
  registration so Rpc.cs gains the llmInference handler surface.
- LlmInferenceProvider.cs: low-level ILlmInferenceProvider API + adapter
  (request staging, response sink state machine) behind an internal
  ILlmInferenceResponseChannel seam for unit testing.
- LlmRequestHandler.cs: idiomatic pass-through base class mapping to
  HttpRequestMessage/HttpResponseMessage and ClientWebSocket, with virtual
  transform/forward hooks for both transports.
- Types.cs/Client.cs: wire LlmInferenceConfig into the client and register
  the provider on start.
- Tests: factored unit-test infra (recording channel/sink, inline provider,
  frame builders) with adapter + handler tests, plus CAPI+BYOK e2e tests
  asserting the session id reaches the callback. e2e provider emits raw JSON
  (reflection-free STJ) and serves all model-layer traffic off-network.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hide the redundant low-level provider interface and adapter from the public
surface in both SDKs; the sole public extension point is now the
LlmRequestHandler base class. Replace the LlmInferenceConfig provider factory
with a direct handler instance (the provider is client-global, constructed
once with no args).

.NET: ILlmInferenceProvider + the LlmInferenceRequest/ResponseInit/ResponseSink
DTOs become internal; LlmRequestHandler implements the interface explicitly so
OnLlmRequestAsync leaves its public surface. LlmInferenceConfig.Handler replaces
the Func<LlmRequestHandler> factory.

TS: stop exporting LlmInferenceProvider and createLlmInferenceAdapter from
index.ts; LlmInferenceConfig.handler replaces createLlmInferenceProvider. The
request/sink DTOs stay exported as onLlmRequest's contract (TS lacks explicit
interface implementation). E2E providers become LlmRequestHandler subclasses
overriding onLlmRequest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collapse the HTTP callback seam to SendRequest/sendRequest, replace websocket hooks with per-connection handlers, and update tests to use the forwarding handler model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port the LLM inference callback feature to the Python SDK, mirroring the
existing Node.js and .NET implementations. Consumers subclass
`LlmRequestHandler` and override `send_request` (idiomatic httpx) for
HTTP or `open_web_socket` (websockets) for the WebSocket transport; both
default to transparent pass-through. Wired through `LlmInferenceConfig` on
the client, registered on the `clientGlobal.llmInference` scope.

Adds the low-level provider/adapter, the httpx-based handler base class,
client wiring, public exports, and httpx as a core dependency. Extends the
Python codegen to emit clientGlobal handler registration and regenerates the
generated RPC bindings.

Includes 8 e2e test files (10 tests) mirroring the Node.js suite — round
trip, session-id threading (CAPI + BYOK), streaming SSE, error mapping,
runtime cancel, consumer cancel, WebSocket transport, and the idiomatic
handler against a real local HTTP+WebSocket upstream. All pass off-network.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the existing Node/.NET/Python LLM inference callback support in the
Go SDK. Consumers register an LlmInferenceProvider (or the idiomatic
LlmRequestHandler over net/http + coder/websocket) via
ClientOptions.LlmInference; the runtime routes every model-layer HTTP and
WebSocket request through it for both CAPI and BYOK sessions.

- Codegen (scripts/codegen/go.ts) now emits the clientGlobal handler
  registration, regenerating go/rpc/zrpc.go.
- New low-level provider types + adapter (llm_inference_provider.go) and the
  idiomatic forwarding handler (llm_request_handler.go).
- Wire LlmInferenceConfig into ClientOptions and the connect/start paths.
- 8 off-network e2e scenarios mirroring the other SDKs (basic, session id,
  stream, errors, cancel, consumer cancel, websocket, handler).

Also fixes a pre-existing Go e2e compile break (AttachmentBlob.Data became
*string in the Rust contract regen baseline) that blocked the e2e package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 3M

Comment thread go/copilot_request_handler.go Outdated

/// A buffered HTTP request handed to [`CopilotRequestHandler::send_request`].
#[non_exhaustive]
pub struct CopilotHttpRequest {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description vs. implementation mismatch — custom types, not http::Request/http::Response

The PR description's comparison table lists Rust's HTTP type as http::Request / http::Response (the standard http crate types). The actual implementation uses SDK-defined CopilotHttpRequest / CopilotHttpResponse wrappers.

Every other SDK hands consumers the ecosystem's native HTTP request/response types so they compose naturally:

SDK Request type Response type
Node.js Request (Fetch API) Response
Python httpx.Request httpx.Response
.NET HttpRequestMessage HttpResponseMessage
Go *http.Request (via RoundTripper) *http.Response
Java HttpRequest (java.net.http) HttpResponse<InputStream>
Rust CopilotHttpRequest (custom) CopilotHttpResponse (custom)

Using custom types is a reasonable trade-off (avoids the B: Body generic parameter dance on http::Request<B>), but it does mean Rust consumers can't pass the intercepted request directly to a reqwest::Client or any other tower::Service<http::Request> middleware. Two small suggestions:

  1. Update the PR description table to say "SDK-specific CopilotHttpRequest / CopilotHttpResponse" rather than the standard crate types.
  2. Consider adding a From<CopilotHttpRequest> for reqwest::Request conversion (or at least the reverse) so consumers who want to forward to a custom backend have a clean path without reconstructing the whole request manually.

The handler rename shortened a javadoc line, so Spotless' wrapping no longer
matched. Ran spotless:apply (javadoc reflow only, no semantic change).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

Cross-SDK Consistency Review ✅

This PR adds HTTP/WebSocket request callback support to all six SDK implementations simultaneously. Here's the cross-language consistency assessment:

API surface — all SDKs aligned

Concept Node.js Python Go .NET Java Rust
Handler class/type CopilotRequestHandler CopilotRequestHandler CopilotRequestHandler CopilotRequestHandler CopilotRequestHandler CopilotRequestHandler trait
HTTP hook sendRequest(Request, ctx) send_request(httpx.Request, ctx) Transport http.RoundTripper SendRequestAsync(HttpRequestMessage, ctx) sendRequest(HttpRequest, ctx) send_request(CopilotHttpRequest, ctx)
WebSocket hook openWebSocket(ctx) open_websocket(ctx) OpenWebSocket func(ctx) OpenWebSocketAsync(ctx) openWebSocket(ctx) open_websocket(ctx, response)
Options field requestHandler request_handler RequestHandler RequestHandler requestHandler request_handler
Forwarder CopilotWebSocketForwarder CopilotWebSocketForwarder CopilotWebSocketForwarder CopilotWebSocketForwarder CopilotWebSocketForwarder CopilotWebSocketForwarder

E2E test coverage — all SDKs present

All six SDKs ship e2e tests covering: handler error surfacing, runtime-driven cancellation, HTTP+WebSocket forwarding, and session ID threading.

Intentional design deviations (documented and appropriate)

  1. Go uses a struct with function fields (Transport http.RoundTripper, OpenWebSocket func(...)) instead of class inheritance — idiomatic Go (no virtual methods). HTTP body arrives in CopilotRequestContext via a channel, then flows through the RoundTripper as a normal *http.Request.

  2. Rust passes the CopilotWebSocketResponse writer as a parameter to open_websocket rather than exposing send_response_message on the handler class — this is documented inline as a known difference and is the idiomatic Rust ownership model. Go follows the same injection pattern via Open(ctx, WebSocketResponseWriter).

  3. Go's CopilotWebSocketForwarder exposes mutation via hook function fields (OnSendRequestMessage, OnSendResponseMessage) rather than method overrides (as in OO languages) — idiomatic Go composition over inheritance.

  4. Cancellation uses each language's native mechanism: AbortSignal (Node.js), asyncio.Event (Python), context.Context (Go), CancellationToken (Rust/.NET), CompletableFuture<Void> (Java) — all correct and idiomatic.

No consistency issues found

All differences across SDKs are deliberate language-idiomatic choices. The feature is implemented with good parity: same semantics, same public API shape, same test scenarios in every language.

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 2.6M ·

SteveSandersonMS and others added 3 commits June 23, 2026 10:21
The body channel is internal framework plumbing: the adapter drains it for
HTTP requests and pumps it to CopilotWebSocketHandler.SendRequestMessage for
WebSocket requests. It was exported only because Go exports by capitalisation;
no other SDK surfaces the body channel on the request context. A consumer
reading it directly (e.g. via RequestContextFrom) would race the adapter's
pump goroutine and lose frames. Lowercase it to body to match the other SDKs
and keep it internal to the adapter layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sessionId() returns null when the request was issued outside any session,
but the nullability was only documented in the Javadoc. Every other SDK makes
this explicit in the type system (TypeScript string?, Python str | None,
.NET string?, Rust Option<String>). Add @nullable from the already-declared
spotbugs-annotations dependency on the field, constructor parameter, and getter
so SpotBugs and IDEs surface null-safety warnings, aligning Java with the other
SDKs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
forceStop()/stop() dispose the JSON-RPC connection and destroy the underlying
socket. If vscode-jsonrpc still has an in-flight write at that moment — most
commonly the auto-generated response to a server->client request (a tool, hook,
userInput, or LLM-inference handler that resolved just before teardown) — the
write rejects with ERR_STREAM_DESTROYED. That response write is internal to
vscode-jsonrpc and awaited by nobody, so the rejection surfaces as an unhandled
promise rejection (observed intermittently in the e2e suite, originating from
pending_work_resume's cold-resume forceStop path, but reproducible by any
consumer that forceStop()s while a server->client request is in flight).

Wrap the StreamMessageWriter so write failures can be swallowed, but only once
a teardown-in-progress flag is set immediately before connection.dispose(). The
writer still fires its error event (forwarded to MessageConnection.onError) and
dispose() still rejects pending requests, so no signal is lost. Outside teardown
the flag stays false, so write failures propagate normally and in-flight
requests continue to fail fast.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SteveSandersonMS SteveSandersonMS merged commit 0667a46 into main Jun 23, 2026
40 of 41 checks passed
@SteveSandersonMS SteveSandersonMS deleted the stevesandersonms/llm-inference-callbacks branch June 23, 2026 09:58
SteveSandersonMS added a commit that referenced this pull request Jun 23, 2026
Replace the real local HTTP listener (CapturingProvider) with the client-global
CopilotRequestHandler interceptor (PR #1689). The handler captures the
Authorization header the runtime applies after resolving getBearerToken over the
providerToken.acquire RPC, and answers with a synthetic 404 entirely in-process:
no socket binding, no listen/close lifecycle, no waitForRequests polling.

Distinct fake .invalid hosts per provider let the per-provider dispatch test
assert routing by host. The handler is installed once per fixture via
createSdkTestContext({ copilotClientOptions: { requestHandler } }) and reset
between tests; non-BYOK (CAPI bootstrap) requests pass through via super.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
public string SessionId { get; set; } = string.Empty;
}

/// <summary>Acknowledgement. Returning successfully simply means the SDK accepted the start frame; it does not imply the request will succeed.</summary>

@stephentoub stephentoub Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to these Rpc.cs types showing up in this PR. Some were already present from when 1.0.64-1 was pulled in with #1739; is this adding more based on a private schema that we haven't ingested yet?

Comment thread dotnet/src/Client.cs

private async Task CleanupConnectionAsync(Connection ctx, List<Exception>? errors, bool gracefulRuntimeShutdown)
{
var runtimeShutdownCompleted = false;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding why all this runtimeShutdownCompleted related code was removed. ShutdownAsync can throw an exception, which is then logged and eaten by the corresponding catch block. So we can be in CleanupCliProcessAsync even if shutdown async didn't complete successfully. What am I missing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - we confirmed the new force shutdown logic is needed.

Comment thread dotnet/src/Client.cs
// deliberately keeps its JSON-RPC server alive to send the
// response and never self-exits. Waiting for a self-exit that
// will never come just wastes time, so terminate the child
// immediately and only wait to reap it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the argument is that in the graceful case it will have already cleaned up and in the non-graceful case we shouldn't wait anyway?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - we confirmed the new force shutdown logic is needed.

Comment on lines +82 to +85
public static CopilotWebSocketMessage Text(string text) => new(Encoding.UTF8.GetBytes(text), isBinary: false);

/// <summary>Creates a binary message from raw bytes.</summary>
public static CopilotWebSocketMessage Binary(ReadOnlyMemory<byte> data) => new(data, isBinary: true);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are unusual .NET method names. They'd be better as e.g. FromText / FromBinary.

public bool IsBinary { get; } = isBinary;

/// <summary>Decodes the payload as UTF-8 text.</summary>
public string GetText() => Encoding.UTF8.GetString(Data.ToArray());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to ToArray this. On .NET Core this can just be Encoding.UTF8.GetString(Data.Span). And we can polyfill that on .NET Framework (if we haven't already).

Comment on lines +82 to +85
public static CopilotWebSocketMessage Text(string text) => new(Encoding.UTF8.GetBytes(text), isBinary: false);

/// <summary>Creates a binary message from raw bytes.</summary>
public static CopilotWebSocketMessage Binary(ReadOnlyMemory<byte> data) => new(data, isBinary: true);

@stephentoub stephentoub Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to have an optional isBinary parameter here, so that if someone already has the UTF8 bytes, they don't need to allocate a string just to roundtrip it back.

EDIT: I see now there's a primary ctor that enables that. Begs the question why both that and this FromBinary method exist.

public Exception? Error { get; init; }

/// <summary>Shared normal-closure instance.</summary>
public static CopilotWebSocketCloseStatus NormalClosure { get; } = new();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.NET has System.Net.WebSockets.WebSocketCloseStatus. Maybe there's a way to incorporate that.

/// forwards by default.
/// </summary>
[Experimental(Diagnostics.Experimental)]
public abstract class CopilotWebSocketHandler : IAsyncDisposable

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're almost reinventing the abstract System.Net.WebSockets.WebSocket class?

await socket.ConnectAsync(ToWebSocketUri(_url), Context.CancellationToken).ConfigureAwait(false);
_upstream = socket;
_pumpCts = CancellationTokenSource.CreateLinkedTokenSource(Context.CancellationToken);
_responsePump = Task.Run(() => PumpResponsesAsync(_pumpCts.Token), _pumpCts.Token);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want to pass CancellationToken.None as the argument to Run here. Otherwise, if Context.CancellationToken is canceled before this task starts executing, it'll be canceled, PumpResponsesAsync will never be called, and its cleanup code will never run.


var type = message.IsBinary ? WebSocketMessageType.Binary : WebSocketMessageType.Text;
return _upstream.SendAsync(
new ArraySegment<byte>(message.Data.ToArray()),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using the overload that takes a ReadOnlyMemory<data> instead of an ArraySegment<byte>... if we haven't already, you'll need to polyfill that on .NET Framework. That'll avoid the need to allocate/copy all the data on each send.


private static async Task<CopilotWebSocketMessage?> ReceiveMessageAsync(WebSocket socket, CancellationToken cancellationToken)
{
var buffer = new byte[16 * 1024];

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rent this from ArrayPool rather than allocating it on each receive. This will be super expensive.

{
try
{
result = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), cancellationToken).ConfigureAwait(false);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with send, this should use the overload that takes Memory<byte>.

[Experimental(Diagnostics.Experimental)]
public class CopilotRequestHandler
{
private static readonly HttpClient s_sharedHttpClient = new();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a singleton default is good, but we should have a ctor that lets someone supply their own and only fall back to the shared singleton if one isn't provided.


private static async Task<HttpRequestMessage> BuildHttpRequestAsync(LlmInferenceExchange exchange)
{
var method = new HttpMethod(exchange.Method.ToUpperInvariant());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ToUpperInvariant necessary?

Comment on lines +502 to +506
#if NETSTANDARD2_0
using var stream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
#else
using var stream = await response.Content.ReadAsStreamAsync(ct).ConfigureAwait(false);
#endif

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have polyfills in the repo; I thought we already had one for ReadAsStreamAsync(CancellationToken), but if not we should add one, and then we can avoid these ifdefs littered throughout the code.

while ((read = await stream.ReadAsync(buffer, 0, buffer.Length, ct).ConfigureAwait(false)) > 0)
#else
while ((read = await stream.ReadAsync(buffer.AsMemory(), ct).ConfigureAwait(false)) > 0)
#endif

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

{
if (chunk.Length > 0)
{
buffer.Write(chunk.ToArray(), 0, chunk.Length);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be buffer.Write(chunk.Span) and we can polyfill it on netfx if we haven't already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants